# Multimodal feature extraction
CLIP ViT L Rho50 K1 Constrained FARE2
MIT
A feature extraction model fine-tuned based on openai/clip-vit-large-patch14, optimizing the image and text encoders
Multimodal Fusion
Transformers

C
LEAF-CLIP
253
0
Moonvit SO 400M
MIT
MoonViT is a native resolution visual encoder, initialized and continuously pre-trained based on SigLIP-SO-400M, suitable for image feature extraction tasks.
Image Enhancement
Transformers

M
moonshotai
275
12
Vit Large Patch14 Clip 224.dfn2b
Other
A vision transformer model based on the CLIP architecture, focused on image feature extraction, released by Apple.
Image Classification
Transformers

V
timm
178
0
Featured Recommended AI Models